Novel response element

ABSTRACT

The present invention is directed to a novel Afx response element comprising the nucleotide sequence AACATGTT, said nucleotide sequence having a DNA binding site for the human fork head transcription factor Afx. The invention also relates to the use of the Afx response element in the screening for genes as diabetes drug targets and in the bioinformatic analysis of the human genome, said genes in turn being useful in other screening methods for compounds modifying the insulin receptor signaling pathway. A further aspect of the invention is a vector construct comprising the novel nucleotide sequence, a host cell transformed with said vector construct, as well as the fusion protein expressed by said host cell.

TECHNICAL FIELD

[0001] The present invention is directed to a novel Afx response element comprising a DNA binding site for the human fork head transcription factor Afx, as well as to its use in the screening for genes.

BACKGROUND AND PRIOR ART

[0002] Diabetes and obesity are global health problems. Diabetes is the leading cause of blindness, renal failure, and lower limb amputations in adults, as well as the major risk factor for cardiovascular disease and stroke. Normal glucose homeostasis requires the finely tuned orchestration of insulin secretion by pancreatic beta-cells in response to subtle changes in blood glucose levels, delicately balanced with secretion of counter-regulatory hormones such as glucagon.

[0003] Type 1 diabetes or insulin-dependent diabetes mellitus, IDDM, results from autoimmune destruction of pancreatic beta-cells causing insulin deficiency. Type 2 or NIDDM (non-insulin dependent diabetes mellitus) is characterized by a triad of (1) resistance to insulin action on glucose uptake in peripheral tissues, especially skeletal muscle and adipocytes, (2) impaired insulin action to inhibit hepatic glucose production, and (3) dysregulated insulin secretion (R. A. DeFronzo, (1997); Diabetes Reviews, 5, pp. 177-269). After glucose infusion or ingestion (i.e., in the insulin stimulated state), the liver in type 2 diabetic patients overproduces glucose and the muscle glucose uptake is decreased leading to both hyperinsulinemia and hyperglycemia.

[0004] Insulin regulates a wide range of biological processes, including glucose transport, glycogen synthesis, protein synthesis, cell growth, and gene expression. Insulin regulates these processes by altering the concentration of critical proteins or by producing activity-altering modifications of pre-existing enzyme molecules. It is clear that insulin can have both positive and negative effects on the transcription of specific genes (R. M. O'Brien, et al. (1996). Gene regulation in Diabetes Mellitus. Lippincott-Raven publishers, Philadelphia. pp. 234-242). The genes regulated by insulin encode not only proteins that have a well-established metabolic connection to insulin, but also secretory proteins/hormones, integral membrane proteins, oncogenes, transcription factors, and structural proteins. Not unexpectedly, this type of regulation of gene expression is seen in the primary tissues associated with the metabolic actions of insulin, namely liver, muscle, and adipose tissue, but it is also seen in tissues not commonly associated with these metabolic effects.

[0005] The cis/trans model of transcriptional control can be used to understand how insulin regulates gene transcription at the molecular level. The fidelity and frequency of transcription initiation of eukaryotic genes are determined by the interaction of cis-acting DNA elements with trans-acting factors. The specific sequence of the cis-acting element determines which trans-acting factor will bind. Several cis-acting elements that mediate the effect of insulin on gene transcription have recently been defined. These are referred to as insulin response sequences or elements (IRSs/IREs) (R. M. O'Brien, et al. (1996). Gene regulation in Diabetes Mellitus. Lippincott-Raven publishers, Philadelphia. pp. 234-242; G. J. P. Kops, et al. (1999). Nature, 398, pp. 630-634; S. Guo et al. (1999). J. Biol. Chem. 274, 17184-17192; J. E. Ayala et al. (1999). Diabetes, 48, 1885-1889; and S. K. Durham et al. (1999). Endocrinology, 140, 3140-3146). However, to date there is no agreement on a single insulin response element. Moreover, formation of heterodimers between two trans-acting factors can alter their ability to activate transcription, their affinity for DNA, or their sequence specificity.

[0006] One important question in the study of insulin-regulated gene transcription is how a signal passes from the insulin receptor in the plasma membrane through the cytoplasm and the nuclear membrane to a specific trans-acting factor binding to an IRE. Well-characterised signal transduction mechanisms downstream of the insulin receptor involve cascades of kinase/phosphatase reactions, including, among others, the phosphatidylinositol 3-kinase (PI3K) pathway (P. J. Coffer, et al. (1998), Biochem. J. 335, pp. 1-13; S. Paradis, et al. (1998), Genes & Development, 12, pp. 2488-2498; B. B. Kahn (1998), Cell, 92, pp. 593-596). Binding of insulin to its cell surface transmembrane receptor stimulates receptor autophosphorylation and activation of the intrinsic tyrosine kinase activity, which results in phosphorylation of several cytosolic docking proteins called insulin receptor substrates (IRSs). IRSs bind to various effector molecules, including the 85 kDa regulatory subunit of PI3K. This localizes the 110 kDa catalytic subunit of PI3K to the plasma membrane. The activated PI3K phosphorylates membrane-bound phosphoinositides (PtdIns), generating PtdIns(3,4)P2 and PtdIns(3,4,5)P3. These lipids bind to the pleckstrin homology (PH) domain of protein kinase B (PKB, also known as Akt), leading to its accumulation at the cell membrane. The binding causes a conformational change in PKB that makes it more accessible to phosphorylation, which is necessary for its activation. The kinases that phosphorylate PKB are themselves targets for lipid products of PI3K and are therefore also localized to the membrane. These kinases are called phosphoinositide-dependent protein kinases (PDK1 and PDK2). Activated PKB dissociates from the membrane and moves to the nucleus and other subcellular compartments.

[0007] The Insulin-Like Pathway in the Nematode Caenorhabditis elegans

[0008] Recent studies in the nematode Caenorhabditis elegans show that a major target of the Akt/PKB homologues, akt-1 and akt-2, is a transcription factor (S. Paradis, et al. (1998); Genes & Development, 12, pp. 2488-2498). An insulin receptor-like signaling pathway regulates C. elegans metabolism, development, and longevity. This pathway is required for reproductive growth and normal metabolism. Mutations in the insulin receptor homologue daf-2 or in the PI3K homologue age-1 cause animals to arrest as dauers, shift metabolism to fat storage, and live longer. This regulation of C. elegans metabolism is similar to the physiological role of mammalian insulin in metabolic regulation. Mutations in the gene daf-16, which encodes a fork head transcription factor that acts downstream of the kinases, suppress the effects of mutations in daf-2 or age-1 (S. Ogg et al. (1997); Nature, 389, pp. 994-999; K. Lin et al. (1997); Science, 278, pp. 1319-1322). The principal role of DAF-2/AGE-1 signaling is thus to antagonize DAF-16. Paradis et al. further showed that inactivation of C. elegans Akt/PKB signaling also causes a dauer constitutive phenotype, and that loss-of-function mutations in the fork head transcription factor DAF-16 relieve the requirement for Akt/PKB signaling to repress dauer formation. This indicates that DAF-16 is a negatively regulated downstream target of Akt/PKB signaling. DAF-16 contains four consensus sites for Akt/PKB phosphorylation, which indicates that the kinase exerts its negative regulatory effect by directly phosphorylating DAF-16 and altering its transcriptional regulatory function.

[0009] Human DAF-16 Homologues

[0010] The proteins identified so far as most closely related to DAF-16 are the human fork head transcription factors Afx, FKHR and FKHRL1. Based on amino acid sequence comparison of their fork head DNA-binding domains, Afx, FKHR, and FKHRL1 share about 60-65% identity with DAF-16 (S. Ogg et al. (1997); Nature, 389, pp. 994-999). The fork head domain of Afx shares 83% and 81% identity with the fork head domains of FKHR and FKHRL1, respectively (M. J. Anderson et al. (1998); Genomics, 47, pp. 187-199). Although this high homology is confined to the fork head domain, and amino acid sequences on either side of this domain show little relatedness, there are several amino acid stretches outside the fork head domain that show marked sequence conservation. An N-terminal region of 24 amino acids is 75-83% conserved, and the C-terminal ends of each protein, where the transactivation domains are located (J. L. Bennicelli et al. (1995). Oncogene, 11, pp. 119-130 and G. J. P. Kops, et al. (1999). Nature, 398, pp. 630-634), contain several stretches of homology. The genes for the human DAF-16 homologues were first identified at chromosomal breakpoints in human tumours (A. Borkhardt et al. (1997); Oncogene, 14, pp. 195-202; W. J. Fredericks, et al. (1995); Molecular and Cellular Biology, 15, pp. 1522-1535; M. J. Anderson et al. (1998); Genomics, 47, pp. 187-199). These tumours were associated with translocation-generated fusion proteins: the Afx/mixed-lineage leukemia (MLL) fusion protein in acute leukemias, and the PAX3/FKHR fusion protein in alveolar rhabdomyosarcomas. These fork head proteins contain three PKB phosphorylation sites. It has recently been proposed that Afx is a substrate for PKB (S. R. James et al. Recent Res. Devel. Biochem., 1 (1999), pp. 63-76; and G. J. P. Kops, et al. (1999), Nature, 398, pp. 630-634). The phosphorylation of Afx increases after insulin stimulation, and this in turn reduces the activity of the transcription factor. Thus, Afx is negatively regulated by PKB.

[0011] A. Brunet, et al. demonstrated in Cell 96 (1999); pp. 857-868, that PKB also regulates the activity of FKHRL1. In the presence of survival factors, such as insulin-like growth factor 1 (IGF1) and neurotrophins, PKB phosphorylates FKHRL1, leading to FKHRL1's retention in the cytoplasm. Survival factor withdrawal leads to FKHRL1 dephosphorylation, nuclear translocation, and target gene activation.

[0012] It has been shown that Afx can activate insulin response element-driven reporter genes (S. R. James et al. Recent Res. Devel. Biochem., 1 (1999), pp. 63-76; and G. J. P. Kops, et al. (1999); Nature, 398; pp. 630-634). However, it has not been shown if these are the optimal response elements, if Afx, FKHR, and FKHRL1 show identical or similar DNA-binding characteristics, or how specific they are with regard to DNA binding.

[0013] Given a representative sample of DNA sequences to which a transcription factor binds, it is possible to generate a specific profile or model which can be applied to identify DNA sequences to which the transcription factor will bind in vitro. Such a model is useful for the identification of genes with a potential binding site for the transcription factor in the promoter, intergenic sequences, or 3′ regions (the introns and sequences which flank the first and last exons). Subsets of the identified genes can then be created, e.g. based on biological knowledge.

THE INVENTION

[0014] The object of the present invention was to find a response element comprising a DNA binding site for the human fork head transcription factor Afx. In accordance with the present invention, a novel Afx response element comprising the nucleotide sequence AACATGTT is hereby provided, said nucleotide sequence having a binding site for the human fork head transcription factor Afx.

[0015] In accordance with the present invention, the DNA binding specificity of the fork head protein Afx has been identified. The binding site for Afx is a palindromic sequence, AACATGTT.
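
For double-stranded DNA, "palindromic" means that the site reads the same 5′ to 3′ on both strands, i.e. it is identical to its own reverse complement. A minimal sketch of this check in Python (the function names are illustrative only):

```python
# Complement table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A DNA site is palindromic if it equals its own reverse complement."""
    return site == reverse_complement(site)


afx_site = "AACATGTT"
print(reverse_complement(afx_site))  # AACATGTT
print(is_palindromic(afx_site))      # True
```

Because the site is self-complementary, searching one strand for AACATGTT also finds every occurrence on the opposite strand.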

[0016] The present invention provides the basis for future computer analysis aimed at identifying genes that are potentially regulated by the transcription factor Afx. The response element is thus useful in the screening for genes that may be used as diabetes drug targets, as well as in bioinformatic analysis of the human genome. The present invention thereby provides a subset of genes transcriptionally responsive to insulin, the response element being useful in the construction and development of assays which enable and facilitate the analysis of genes interacting with cytokine receptor signaling pathways (e.g. the insulin receptor). Genes found in such screening may in turn be useful in additional screening methods for compounds modifying the insulin receptor signaling pathway.
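
The simplest form of such a computer analysis is an exact-match search for AACATGTT in a set of candidate regulatory sequences. The sketch below assumes the sequences are already available as a dictionary from gene name to sequence; the gene names and sequences are hypothetical toy data, and a real analysis would draw promoter, intergenic or 3′ sequences from a genome annotation.

```python
import re

AFX_SITE = "AACATGTT"


def find_sites(seq: str, site: str = AFX_SITE) -> list[int]:
    """Return 0-based start positions of all exact (possibly overlapping) matches."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def genes_with_site(promoters: dict[str, str]) -> dict[str, list[int]]:
    """Keep only genes whose sequence contains at least one Afx site."""
    hits = {gene: find_sites(seq) for gene, seq in promoters.items()}
    return {gene: pos for gene, pos in hits.items() if pos}


# Hypothetical toy input, not real promoter sequences.
promoters = {
    "geneA": "ttgcAACATGTTcagg",
    "geneB": "ccgtaaacaggtacc",
    "geneC": "AACATGTTAACATGTT",
}
print(genes_with_site(promoters))  # {'geneA': [4], 'geneC': [0, 8]}
```

Since AACATGTT is its own reverse complement, a single-strand scan is sufficient here; a non-palindromic site would require scanning its reverse complement as well.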

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] FIG. 1 is a schematic presentation of the fusion proteins used in the selection procedure.

[0018] FIG. 2 is a schematic presentation of the selection and amplification cycle procedure to isolate high-affinity DNA binding sites for the transcription factors.

[0019] FIG. 3 is an alignment of the 20 sequences selected for GST/AfxDBD. Each sequence contains 25 nucleotides, and the conserved core motif and the partially conserved flanking nucleotides are shown at the bottom in bold and italics, respectively.

[0020] FIG. 4 is a summary of selected DNA binding sites for the three fork head proteins Afx, FKHR and FKHRL1. Numbers represent the frequency in percentage for each nucleotide at each position. The number of sequences on which this summary is based is 20, 27 and 10 for Afx, FKHR and FKHRL1, respectively.

[0021] FIG. 5 shows the Afx frequency and weight matrix.

[0022] FIG. 6 is a map of the expression plasmid pGEX-DBD used for expression of the GST fusion proteins GST/AfxDBD, GST/FKHRDBD, and GST/FKHRL1DBD.

[0023] FIG. 7 shows the nucleotide sequence encoding GST-AfxDBD; and

[0024] FIG. 8 is the corresponding amino acid sequence of the protein expressed from GST-AfxDBD.

DEFINITIONS AND ABBREVIATIONS

[0025] In order to provide a clear and consistent understanding of the invention, the following definitions are provided.

[0026] BSA: Bovine Serum Albumin

[0027] C-terminal: Carboxy-terminal

[0028] dNTP: Deoxy Nucleotide Triphosphate

[0029] DTT: Dithiothreitol

[0030] EDTA: Ethylenediaminetetraacetic acid

[0031] GST: Glutathione S-Transferase

[0032] HEPES: N-[2-hydroxyethyl]piperazine-N′-[2-ethanesulfonic acid]

[0033] IPTG: Isopropylthiogalactoside, an inducer for the E. coli lac operon

[0034] MOPS buffer: 3-[N-Morpholino]propanesulfonic acid

[0035] NuPAGE®: precast polyacrylamide gel system from Novex/Invitrogen

[0036] PBS: Phosphate buffered saline

[0037] PCR: Polymerase Chain Reaction

[0038] PMSF: Phenylmethylsulfonyl fluoride

[0039] SDS-PAGE: Sodium dodecyl sulfate polyacrylamide gel electrophoresis

[0040] Tris-HCl: Tris(hydroxymethyl)aminomethane hydrochloride

[0041] Triton X-100: t-octylphenoxypolyethoxyethanol

[0042] Tween 20: Polyoxyethylene sorbitan monolaurate

[0043] X-gal: 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside

[0044] Plasmid: A cloning vector which is able to replicate autonomously in a host cell, and which is characterized by one or a small number of restriction endonuclease recognition sites. A foreign DNA fragment may be spliced into the plasmid/cloning vector at these sites in order to bring about the replication and cloning of the fragment. The vector may contain a marker suitable for use in the identification of transformed cells. For example, markers may provide tetracycline resistance or ampicillin resistance.

[0045] Expression: Expression is the process by which a polypeptide is produced from DNA. The expression process involves the transcription of the gene into mRNA, and the translation of this mRNA into a polypeptide.

[0046] Expression vector: A vector similar to a cloning vector but which is capable of inducing the expression of the DNA that has been cloned into it, after transformation into a host. The cloned DNA is usually placed under the control of (i.e. operably linked to) certain regulatory sequences such as promoters or enhancers. Promoter sequences may be constitutive, inducible or repressible.

[0047] Host: Any prokaryotic or eukaryotic cell that is the recipient of a replicable expression vector or cloning vector is the “host” for that vector. The term encompasses prokaryotic or eukaryotic cells that have been engineered to incorporate a desired gene on their chromosome or in their genome. Examples of cells that can serve as hosts are well known in the art, as are techniques for cellular transformation (see e.g. Sambrook et al. Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor (1989)).

[0048] Promoter: A DNA sequence typically found in the 5′ region of a gene, located proximal to the start codon. Transcription is initiated at the promoter. If the promoter is of the inducible type, then the rate of transcription increases in response to an inducing agent.

[0049] Response element: A short cis-acting DNA sequence that is involved in the transcriptional control of a gene. A response element may be located in the promoter, in intergenic sequences, or in 3′ regions (the introns, and sequences which flank the first or last exons) of a gene.

[0050] Scoring: For any given sequence of the same width as the model, take the matrix value corresponding to the observed nucleotide at each position and sum these values.
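
The scoring rule can be sketched as follows, together with a sliding-window scan that applies it to every window of a longer sequence. The three-position matrix is a hypothetical toy; the actual Afx weight matrix (FIG. 5) is not reproduced here and would be substituted for it.

```python
# One row per motif position; each row maps a nucleotide to its weight.
Matrix = list[dict[str, float]]


def score(window: str, matrix: Matrix) -> float:
    """Sum the matrix value of the observed nucleotide at each position."""
    if len(window) != len(matrix):
        raise ValueError("window must be exactly as wide as the model")
    return sum(row[base] for row, base in zip(matrix, window.upper()))


def scan(seq: str, matrix: Matrix) -> list[tuple[int, float]]:
    """Score every window of the model's width along a sequence (one strand)."""
    width = len(matrix)
    return [(i, score(seq[i:i + width], matrix)) for i in range(len(seq) - width + 1)]


# Hypothetical toy matrix favouring "AAC"; not the Afx matrix of FIG. 5.
toy = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
]
print(score("AAC", toy))    # 4.0
print(scan("GAACT", toy))   # [(0, -1.0), (1, 4.0), (2, -1.0)]
```

Windows whose score exceeds a chosen threshold would be reported as candidate binding sites; how such a threshold is set is not specified in the text and is left open here.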

[0051] I. Construction of Plasmids

[0052] DNA binding domain sequences of the human transcription factors Afx, FKHR, and FKHRL1 comprising amino acid residues G86 to A211 (hAfxDBD), L145 to A270 (hFKHRDBD), and G142 to A267 (hFKHRL1DBD), respectively (FIG. 1), were amplified by PCR using primers which included 5′ and 3′ BamHI sites, digested with BamHI and inserted into BamHI-digested pGEX-2T-KB (FIG. 6). This vector was derived from pGEX-2T (Amersham Pharmacia Biotech) by introducing a polylinker (5′GATCTGGTACCGAGCTCGGATCCCCGGG, Scandinavian Gene Synthesis, Sweden) at the BamHI and EcoRI sites. In addition to the BamHI, EcoRI and SmaI sites present in pGEX-2T, the cloning cassette of the resulting pGEX-2T-KB vector contains new KpnI, SacI and AvaI restriction sites. The DNA binding domains were cloned in frame with the GST tag in pGEX-2T-KB to produce pGEX-AfxDBD, pGEX-FKHRDBD, and pGEX-FKHRL1DBD. The sequences of the inserted DNA fragments were confirmed by DNA sequencing. The nucleotide sequences of the primers used for the PCR amplification were (Scandinavian Gene Synthesis, Sweden): AfxDBD5′, 5′GACGACGGATCCGGGGCTGTAACAGGTCCTC; AfxDBD3′, 5′GACGACGGATCCTCAGGCTTTACTGCGGCCCCG; FKHRDBD5′, 5′GACGACGGATCCCTCGCGGGGCAGCCGCGC; FKHRDBD3′, 5′GACGACGGATCCTCAAGCTCGGCTTCGGCTC; FKHRL1DBD5′, 5′GACGACGGATCCGGGGGCTCCGGGCAGCCG; FKHRL1DBD3′, 5′GACGACGGATCCTCATGCGCGGCCACGGCTCTTG.
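
The restriction sites in the polylinker and in the PCR primers can be checked with a short script. This is an illustrative sketch only: it uses the standard recognition sequences of KpnI (GGTACC), SacI (GAGCTC), BamHI (GGATCC), SmaI (CCCGGG) and AvaI (CYCGRG, where Y = C/T and R = A/G), and it scans only the 28-nt polylinker oligonucleotide and the six primers quoted in the text, not the complete vector.

```python
import re

# IUPAC codes needed here; Y = C or T, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}

# Recognition sequences, top strand 5'->3'. All are self-complementary,
# so scanning one strand is enough.
ENZYMES = {
    "KpnI": "GGTACC",
    "SacI": "GAGCTC",
    "BamHI": "GGATCC",
    "SmaI": "CCCGGG",
    "AvaI": "CYCGRG",
}


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of a (possibly degenerate) recognition site."""
    pattern = "".join(IUPAC[base] for base in site)
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]


polylinker = "GATCTGGTACCGAGCTCGGATCCCCGGG"
for name, site in ENZYMES.items():
    print(name, site_positions(polylinker, site))

# PCR primers as given in the text.
primers = {
    "AfxDBD5'": "GACGACGGATCCGGGGCTGTAACAGGTCCTC",
    "AfxDBD3'": "GACGACGGATCCTCAGGCTTTACTGCGGCCCCG",
    "FKHRDBD5'": "GACGACGGATCCCTCGCGGGGCAGCCGCGC",
    "FKHRDBD3'": "GACGACGGATCCTCAAGCTCGGCTTCGGCTC",
    "FKHRL1DBD5'": "GACGACGGATCCGGGGGCTCCGGGCAGCCG",
    "FKHRL1DBD3'": "GACGACGGATCCTCATGCGCGGCCACGGCTCTTG",
}
for name, seq in primers.items():
    print(name, "BamHI at", site_positions(seq, ENZYMES["BamHI"]))
```

In the polylinker, KpnI, SacI, BamHI and SmaI are found at positions 5, 11, 17 and 22; AvaI coincides with SmaI at position 22, because CCCGGG also satisfies CYCGRG. Each primer carries a single BamHI site at position 6, immediately after six extra 5′ bases (GACGAC).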

[0053] II. Expression and Analysis of Recombinant Proteins

[0054] Escherichia coli BL21(DE3) (Novagen) were transformed with pGEX-AfxDBD, pGEX-FKHRDBD, pGEX-FKHRL1DBD, and pGEX-2T-KB (control), and transformants were used for inoculation of 20 ml of Luria broth medium (Luria, S. E., and Burrows, J. W. (1957), J. Bacteriol. 74: p. 461-476) containing 100 μg/ml carbenicillin (Sigma) and incubated in shaking flasks at 37° C. overnight. The cultures were diluted into 100 ml of fresh medium to an OD600 of 0.1 and incubated with vigorous shaking at 37° C. Expression was induced by addition of IPTG (final concentration 1 mM) at an OD600 of 0.5-0.6, and the incubation was continued for 2.5 h. Bacteria were harvested by centrifugation at 4,000×g for 15 min at 4° C. and the cell pellets were stored at −70° C.

[0055] Expression was analysed by SDS-polyacrylamide gel electrophoresis followed by staining with Coomassie brilliant blue (Sigma). Aliquots collected before and after IPTG induction were centrifuged at 20,000×g for 10 minutes. Pellets were resuspended in sample buffer containing DTT, and heated at 95° C. for 5 minutes. The samples were loaded on NuPAGE 10% Bis-Tris gels (Novex). Gels were run for 50 minutes at 200 volts in MOPS buffer (Biorad Model 100/500, Power Supply).

[0056] III. Preparation and Analysis of Bacterial Extracts

[0057] Cell pellets from 50 ml culture were thawed on ice before suspension in 2.5 ml TNT buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, 100 mM NaCl, 1% Triton X-100). Bacteria were lysed by addition of 2.5 mg lysozyme (Merck), incubation at 4° C. for 1 h, and sonication with a Vibra-Cell high intensity ultrasonic processor (Sonics & Materials) (4×20 s, 50% duty cycle, 3.5 output control). The lysates were cleared by ultracentrifugation at 100,000×g for 1 h at 7° C. DTT and glycerol were added to final concentrations of 2 mM and 15%, respectively, and the lysates were frozen in 1 ml aliquots at −70° C.

[0058] Aliquots of the bacterial extracts collected before and after the ultracentrifugation were analysed by SDS-polyacrylamide gel electrophoresis, as described above, followed by Coomassie brilliant blue staining and Western blotting analysis. Proteins were transferred onto a 0.45 μm nitrocellulose membrane (Hybond, ECL, Amersham Pharmacia Biotech) for one hour at 100 volts using a Novex Western Transfer Apparatus. The membrane was incubated in blocking buffer containing 5% low fat dried milk and 0.1% Tween 20 in 1×PBS for 1 h to block nonspecific binding. Before addition of the primary antibody, the membrane was washed 2×5 minutes in washing buffer containing 1×PBS and 0.1% Tween 20. Primary antibodies raised against the GST tag (goat anti-GST antibody, Amersham Pharmacia Biotech) were used at a concentration of 5 μg/ml. The membrane was washed 3×10 minutes with washing buffer before addition of the secondary antibody (rabbit anti-goat antibody, from DAKO). Secondary antibodies were used at a concentration of 0.23 μg/ml. Incubation with primary and secondary antibodies was for one hour at room temperature. Before detection with the ECL kit (Amersham Pharmacia Biotech), the membrane was washed 4×10 minutes in washing buffer. Equal volumes of detection solutions 1 and 2 from the kit were added to the membrane and incubated for one minute. The membrane was placed in a film cassette and Hyperfilm-ECL (Amersham Pharmacia Biotech) was placed on top of the membrane for 3 s. The film was developed in an Agfa Curix 60.

EXAMPLES

[0059] The invention will now be described in more detail by way of the following examples, which however should not in any way be construed as limiting the invention.

Example 1

[0060] I. Generation of Randomized Oligonucleotides

[0061] Sequences of the random oligonucleotide and primers (Scandinavian Gene Synthesis) used for the DNA binding site selection procedure and sequencing: N25: 5′CGCTCGAGGGATCCGAATTC(N)25TCTAGAAAGCTTGTCGACGC; N25 5′ primer: 5′CGCTCGAGGGATCCGAATTC; N25 3′ primer: 5′GCGTCGACAAGCTTTCTAGA.
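
The design of the N25 template can be verified directly: 20 fixed bases, 25 random bases and 20 fixed bases give a 65-mer; the 5′ primer is identical to the 5′ flank, and the 3′ primer is the reverse complement of the 3′ flank, so it anneals there. A minimal sketch (the random generator and seed are illustrative choices, not part of the protocol):

```python
import random

FLANK_5 = "CGCTCGAGGGATCCGAATTC"   # identical to the N25 5' primer
FLANK_3 = "TCTAGAAAGCTTGTCGACGC"
PRIMER_3 = "GCGTCGACAAGCTTTCTAGA"  # N25 3' primer


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def random_n25(rng: random.Random) -> str:
    """One instance of the template: 5' flank + 25 random bases + 3' flank."""
    core = "".join(rng.choice("ACGT") for _ in range(25))
    return FLANK_5 + core + FLANK_3


oligo = random_n25(random.Random(0))
print(len(oligo))                    # 65
print(revcomp(FLANK_3) == PRIMER_3)  # True
```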

[0062] To obtain double-stranded oligonucleotides with randomized sequence in the central 25 base pairs as starting material for the selection procedure, 12 μg of N25 was mixed with 10 μg of N25 3′ primer in 100 μl of 10 mM Tris-Cl, pH 7.5, 10 mM MgCl2, 1 mM DTT. The solution was heated to 95° C., followed by slow cooling in a water bath to 55° C., and kept at this temperature for 30 minutes. Following annealing at 55° C., the tube was transferred to 37° C., 10 μl of 10 mM dNTP and 2.5 μl Klenow enzyme (Boehringer, 2 U/μl) were added, and the incubation was continued at 37° C. for 30 minutes. NaCl was added to a final concentration of 0.25 M, and the DNA was precipitated with 300 μl ethanol at −20° C. overnight. The precipitated DNA was recovered by centrifugation, washed with 70% ethanol, recovered by centrifugation, vacuum dried, and finally resuspended in 148 μl dH2O. Successful conversion of N25 to double-stranded oligonucleotides was verified by running an aliquot on a 4% NuSieve agarose gel (FMC BioProducts).
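
The masses given for the annealing reaction can be converted to amounts of substance to see the molar ratio of primer to template. This sketch assumes an average of about 330 g/mol per nucleotide of single-stranded DNA, a common approximation that is not stated in the protocol; exact values depend on base composition.

```python
AVG_NT_MASS = 330.0  # g/mol per nucleotide; assumed approximation


def pmol(mass_ug: float, length_nt: int) -> float:
    """Convert a mass (µg) of single-stranded oligonucleotide to picomoles."""
    return mass_ug * 1e-6 / (length_nt * AVG_NT_MASS) * 1e12


template = pmol(12, 65)  # 12 µg of N25 (65 nt)
primer = pmol(10, 20)    # 10 µg of N25 3' primer (20 nt)
print(round(template), round(primer), round(primer / template, 1))  # 559 1515 2.7
```

Under this assumption the primer is present in roughly 2.7-fold molar excess over the template, which favours priming of most template molecules before the Klenow fill-in.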

[0063] II. Selection of Binding Sites

[0064] 148 μl of double-stranded 65-mer was mixed with 10 μl of 2 mg/ml poly(dI-dC)/poly(dI-dC) (Amersham Pharmacia Biotech) and 40 μl of 5×binding buffer (1×binding buffer is 20 mM Hepes, pH 7.9, 50 mM KCl, 2 mM MgCl2, 0.5 mM EDTA, 10% glycerol, 0.1 mg/ml BSA, 2 mM DTT, 0.5 mM PMSF). This solution was divided into two Eppendorf tubes, and 1 μl of undiluted and 1 μl of 1/10-diluted bacterial extract were added to the two tubes, respectively. The bacterial extract was estimated to contain approximately 500 ng fusion protein (SDS-PAGE analysis). This was done for GST/AfxDBD, GST/FKHRDBD, GST/FKHRL1DBD, and GST (control). Following incubation of the binding reactions at room temperature for 10 minutes, 50 μl of a 10% slurry of glutathione-Sepharose in 1×binding buffer (Amersham Pharmacia Biotech) were added and the tubes were flicked gently for 2 minutes to prevent the Sepharose from settling. The glutathione-Sepharose beads with bound protein-DNA complexes were pelleted in a microcentrifuge at 3,000×g for 1 minute. The supernatants were removed and the pellets were resuspended in 1 ml of ice-cold 1×binding buffer, transferred to new tubes, centrifuged, and the supernatants were removed. Three more 1 ml washes were made, with the samples transferred to new tubes before the last centrifugation.

[0065] The washed glutathione-Sepharose pellets were resuspended in 50 μl of PCR buffer (10 mM Tris-Cl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.001% gelatin) and transferred to 0.5-ml microcentrifuge tubes. Fifty microliters of amplification mix (PCR buffer containing 0.4 mM dNTP, 2 μM N25 5′ primer, and 2 μM N25 3′ primer) and 0.5 μl Taq polymerase (Boehringer, 5 U/μl) were added and the samples were PCR amplified for 30 cycles (Perkin Elmer, Gene Amp PCR System 2400). Each cycle consisted of a 1 min incubation at 96° C. and a 30 s incubation at 60° C. The final amplification step consisted of a single 30 s incubation at 72° C. A 10 μl aliquot of each selection was analysed on a 4% NuSieve agarose gel to verify the presence of a 65-bp product, and the rest of the PCR reaction was precipitated with 10 μl 5 M NaCl and 250 μl ethanol at −20° C. overnight. The precipitated 65-mer PCR products were recovered by centrifugation, washed with 70% ethanol, recovered by centrifugation, vacuum dried, and finally resuspended in 120 μl dH2O. The solutions were filtered through 0.45 μm Spin-X filters (Costar) to remove the Sepharose beads.

[0066] The second round of selection was identical to the first, except for the following modifications: the binding reaction was set up with 10 μl of the Spin-X filtrate from the first amplification, 64 μl dH2O, 5 μl poly(dI-dC)/poly(dI-dC) (2 mg/ml), 20 μl of 5×binding buffer and 1 μl of bacterial extract as described above for the first round of selection. After the last wash, the Sepharose pellets were resuspended in 100 μl PCR buffer and boiled. Fifty microliters were combined with 50 μl amplification mix. The remaining 50 μl were set aside as backup in case the PCR had to be repeated. The number of cycles in the PCR reaction was decreased to 20.

[0067] The following rounds of selection and amplification were identical to the second round, except that the number of cycles in the PCR reaction was decreased, from 30 cycles in the first amplification to 10 cycles in the sixth and last amplification, as the fraction of high-affinity binding sites in the DNA pool increased.

[0068] After the sixth round of selection and amplification, the PCR products were separated on a 4% NuSieve agarose gel, and purified using the QIAEX II Gel Extraction Kit (QIAGEN). A 10 μl aliquot of the purified PCR products was analysed on a 4% NuSieve agarose gel.

[0069] III. Cloning and Sequencing of Selected Oligonucleotides

[0070] The gel-purified PCR products from the last selection and amplification cycle were cloned into the pCR-Script SK(+) vector (1) (Stratagene) or the pT7Blue vector (2) (Novagen). E. coli XL1-Blue Ultracompetent cells (Stratagene) and NovaBlue competent cells (Novagen) were transformed with the ligation reactions from 1 and 2, respectively, followed by spreading of the transformation mixtures on Luria Agar (LA) plates containing 100 μg/ml of ampicillin, 80 μM IPTG, and X-gal. For each GST-fusion protein, 20 to 30 white colonies were selected, plasmid DNA (70 μg/ml) was prepared (QIAGEN, QIAprep Spin), and digested with PvuII (Boehringer Mannheim) followed by analysis on a 4% NuSieve agarose gel. The clones were analysed by DNA sequencing and the central 25 bp of the inserts were aligned using Vector NTI Suite, Multiple Sequence Alignment (Informax, USA).

Example 2 Production of GST Fusion Proteins

[0071] The DNA binding domains (DBD) of Afx, FKHR, and FKHRL1 were expressed as GST-fusion proteins. For this, the DBDs of the fork head proteins were inserted into the BamHI site of the pGEX-2T-KB plasmid to produce pGEX-DBD plasmids. The sequences of the inserted DNA fragments were confirmed by DNA sequencing, except for the C-terminal part of FKHRL1DBD. This part was GC-rich, which made it difficult to sequence; even after several attempts with different sequencing approaches, the last 15 nucleotides could not be determined. Even though the C-terminal part was not confirmed, the plasmid construct was used in the following experiments. The GST-fusion proteins and GST alone (control) (see FIG. 1) were expressed in E. coli. SDS-polyacrylamide gel electrophoresis analysis of the expression before and after IPTG induction showed expression of GST-fusion proteins of the expected sizes.

[0072] The nucleotide sequence encoding the fusion protein GST-Afx-DBD is shown in FIG. 7, and the corresponding amino acid sequence is shown in FIG. 8.

[0073] Insolubility of recombinant proteins can occur when expressing them in E. coli. The fusion proteins used here had different solubility properties: GST/FKHRDBD, GST/FKHRL1DBD, and GST were soluble, whereas GST/AfxDBD was partly present as inclusion bodies. However, there was still enough GST/AfxDBD in the soluble fraction to successfully complete the selection procedure.

Example 3 Generation of Randomized Oligonucleotides

[0074] To obtain the randomized oligonucleotides used as starting material in the selection of the DNA binding sites, single-stranded oligonucleotides with randomized sequences in the central 25 bases were converted to double-stranded oligonucleotides. The smear of bands observed on the agarose gel underneath the 65-mer band presumably reflects partially converted double-stranded oligonucleotides.

[0075] I. Selection of DNA-Binding Sites

[0076] Selection of DNA-binding sites for the fork head proteins Afx, FKHR, and FKHRL1 was performed according to Pierrou et al. (S. Pierrou et al. (1995); Analytical Biochemistry, 229, pp. 99-105), as shown in FIG. 2. Binding reactions are set up with bacterial extract containing the GST-fusion protein and double-stranded oligonucleotides in which the central 25 bp have been randomized. To minimize non-specific protein-DNA interactions, binding is done in the presence of high levels of poly(dI-dC)/poly(dI-dC). The DNA-bound GST fusion protein complex is recovered by the addition of glutathione-Sepharose beads. Following extensive washing of the resin, the bound oligonucleotides are rescued by polymerase chain reaction amplification. The amplified material is used as the DNA pool in the next cycle of selection and amplification. After six cycles the amplified oligonucleotides are cloned and their DNA sequences determined.
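
Why repeated cycles enrich high-affinity sites can be illustrated with a deliberately simple two-class model of the pool. The model and its parameters are hypothetical and not measured: high-affinity molecules are assumed to survive a selection round with probability p_high, all other molecules (non-specific carry-over) with probability p_low, and PCR is assumed to amplify both classes equally.

```python
def enrich(f0: float, p_high: float, p_low: float, rounds: int) -> list[float]:
    """Fraction of high-affinity molecules in the pool after each selection round.

    Two-class toy model: retention probabilities p_high and p_low are assumed,
    and amplification is assumed to be unbiased between the two classes.
    """
    f = f0
    history = [f]
    for _ in range(rounds):
        kept_high = f * p_high
        kept_low = (1 - f) * p_low
        f = kept_high / (kept_high + kept_low)
        history.append(f)
    return history


# Hypothetical parameters: 1 in 10,000 molecules starts as a good site,
# and good sites are retained 100 times more efficiently than the rest.
for i, f in enumerate(enrich(f0=1e-4, p_high=0.5, p_low=0.005, rounds=6)):
    print(i, f"{f:.6f}")
```

In this model the odds of picking a high-affinity molecule grow by the factor p_high/p_low in every round, which is consistent with the text's observation that fewer PCR cycles were needed as the fraction of high-affinity sites in the pool increased.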

[0077] The PCR products from each cycle of selection and amplification were analysed by agarose gel electrophoresis. For all of the GST-fusion proteins the gel showed a distinct band of 65 bp. The analysis of the PCR products for the GST-fusion proteins gave similar results after each round of selection. When GST alone (control) was used in the selection procedure, PCR products were found only after the first two rounds of selection, but not after subsequent cycles. This verifies that the oligonucleotides are selected by the fork head DBD moiety of the fusion proteins. Two different concentrations of bacterial extract were used for each fusion protein in the selection and amplification procedure. However, the two concentrations did not differ in terms of the amount of amplified oligonucleotides.

[0078] After the sixth round of selection and amplification, the PCR products selected by the various fork head fusion proteins were cloned, and 20-30 colonies from each selection were sequenced. The central 25 bp of the oligonucleotide sequences were aligned; the results for the Afx selection are shown in FIG. 3. A common motif can easily be identified for each fork head protein. The consensus sequence obtained from the alignment for the Afx transcription factor has a palindromic structure, AACATGTT (FIG. 3). The other two fork head proteins, FKHR and FKHRL1, share the DNA-binding sequence GTAAA(C/T)A.
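A palindromic DNA site reads identically on both strands, that is, it equals its own reverse complement. The short Python check below (function names illustrative) makes the contrast between the Afx consensus and both variants of the FKHR/FKHRL1 sequence explicit.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site is identical on the top and bottom strands."""
    return site.upper() == reverse_complement(site)


print(is_palindromic("AACATGTT"))  # True
print(is_palindromic("GTAAACA"))   # False
print(is_palindromic("GTAAATA"))   # False
```

A practical consequence is that a one-strand search for AACATGTT already finds every exact Afx consensus site, whereas the FKHR/FKHRL1 sequence must be searched on both strands.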

[0079] The frequency of each of the four nucleotides at each position of the binding site was calculated from the aligned sequences (FIG. 4). To ensure that the calculation was based only on high-affinity sites, any sequence containing more than one possible match to the consensus motif was excluded. This ensured that oligonucleotides that survived the selection procedure through the combined binding energy of several suboptimal binding sites, rather than through a single high-affinity site, did not contribute to the final consensus.
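The counting and exclusion rule can be sketched in Python. The sketch rests on several assumptions. A "possible match" is taken to be an exact occurrence of AACATGTT, although the original criterion may have allowed mismatches. Each site is extended by two flanking bases on each side. Clones whose window would run past the sequence end are skipped. The input uses four clones from the sequence listing purely for illustration and is not intended to reproduce FIG. 4.

```python
from collections import Counter

MOTIF = "AACATGTT"
FLANK = 2   # bases kept on each side of the core (illustrative choice)


def occurrences(seq: str, motif: str) -> list[int]:
    """Start positions of all exact, possibly overlapping, matches."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]


def frequency_matrix(sequences: list[str]) -> list[dict[str, float]]:
    """Per-position base frequencies from clones with exactly one motif match."""
    sites = []
    for seq in sequences:
        seq = seq.upper()
        hits = occurrences(seq, MOTIF)   # AACATGTT is palindromic: one strand suffices
        if len(hits) != 1:               # drop clones with no site or several sites
            continue
        start, end = hits[0] - FLANK, hits[0] + len(MOTIF) + FLANK
        if start < 0 or end > len(seq):  # window would run off the clone
            continue
        sites.append(seq[start:end])
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({b: counts[b] / len(sites) for b in "ACGT"})
    return matrix


# Sequences of SEQ ID NOs: 1, 2, 3 and 10 from the sequence listing
selected = [
    "gccccactccataacatgttgttc",
    "ggccgcggataacaacatgttgttg",
    "ggccgactatcaacatgtttgcctg",
    "ggcatcaacatgtttataatgggtg",
]
for position, column in enumerate(frequency_matrix(selected)):
    print(position, column)
```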

Example 4

[0080] In order to produce a weight matrix representing a group of binding sites for a transcription factor, it is necessary to identify a representative frequency matrix, which in turn requires near-optimal alignments for each set of site sequences displayed in FIGS. 4A, 4B and 4C. Most alignment methods are not designed for short transcription factor binding sites with highly variable columns between conserved positions. The Gibbs sampling expectation-maximization method originally described by C. E. Lawrence et al. (1993); Science, 262 (5131), pp. 208-214, has therefore been utilized, with modifications for DNA sequences introduced by J. W. Fickett (1996); Mol. Cell Biol., 16 (1), pp. 437-441.
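The core idea of the Lawrence et al. site sampler can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the modified program actually used. It assumes exactly one site per sequence (rather than a user-specified number of instances), a fixed motif width, a pseudocount of 0.5 per base, and a background composition estimated from the input sequences. Each iteration leaves out one sequence, builds a profile from the current sites in the remaining sequences, and resamples the left-out sequence's site position in proportion to the ratio of motif-model probability to background probability. The planted input data are synthetic.

```python
import math
import random

BASES = "ACGT"


def profile(sites: list[str], width: int, pseudo: float = 0.5) -> list[dict[str, float]]:
    """Column probabilities of the current alignment, with pseudocounts."""
    n = len(sites)
    return [{b: (sum(s[j] == b for s in sites) + pseudo) / (n + 4 * pseudo) for b in BASES}
            for j in range(width)]


def gibbs_sites(seqs: list[str], width: int, iterations: int = 500, seed: int = 0) -> list[int]:
    """One-site-per-sequence Gibbs sampler; returns the chosen start in each sequence.

    Sequences must contain only A, C, G and T and be at least `width` long.
    """
    rng = random.Random(seed)
    seqs = [s.upper() for s in seqs]
    pooled = "".join(seqs)
    background = {b: (pooled.count(b) + 1) / (len(pooled) + 4) for b in BASES}
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]   # random initial alignment
    for _ in range(iterations):
        i = rng.randrange(len(seqs))                             # sequence left out
        others = [s[p:p + width] for k, (s, p) in enumerate(zip(seqs, starts)) if k != i]
        q = profile(others, width)
        target = seqs[i]
        # log of (motif-model probability / background probability) for every window
        log_ratios = [sum(math.log(q[j][c] / background[c])
                          for j, c in enumerate(target[p:p + width]))
                      for p in range(len(target) - width + 1)]
        top = max(log_ratios)                                    # for numerical stability
        weights = [math.exp(r - top) for r in log_ratios]
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Synthetic test data: AACATGTT planted between random flanks
rng = random.Random(1)
planted = []
for _ in range(8):
    left = "".join(rng.choice(BASES) for _ in range(rng.randint(3, 12)))
    right = "".join(rng.choice(BASES) for _ in range(rng.randint(3, 12)))
    planted.append(left + "AACATGTT" + right)

starts = gibbs_sites(planted, width=8)
print([s[p:p + 8] for s, p in zip(planted, starts)])
```

Because the sampler is stochastic, a single run can settle on a suboptimal alignment; repeating it with different seeds and keeping the strongest result is the usual remedy.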

[0081] This method determines patterns in biopolymer sequences that are significantly stronger (i.e., have higher information content in the information-theoretic sense) than random patterns. The program used performs five separate searches and reports the strongest pattern detected; in the case of the Afx data, all five searches produced the same pattern. The user must specify the expected number of pattern instances, which affects the output of the program.
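Information content can be measured as the relative entropy of each column against the background, summed over the columns of the pattern. With a uniform background, an invariant column contributes 2 bits and a column split evenly between two bases contributes 1 bit. The Python sketch below computes this quantity. As an illustration of reporting the strongest of several searches, it then keeps the candidate pattern with the highest total. The candidate patterns are hypothetical, and the uniform background is an assumption.

```python
import math


def column_information(freqs: dict[str, float], background: float = 0.25) -> float:
    """Relative entropy (bits) of one column against a uniform background."""
    return sum(p * math.log2(p / background) for p in freqs.values() if p > 0)


def pattern_information(matrix: list[dict[str, float]]) -> float:
    """Information content of a whole pattern: the sum over its columns."""
    return sum(column_information(col) for col in matrix)


conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}   # invariant column
two_way = {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5}     # A/T column
uniform = {b: 0.25 for b in "ACGT"}                    # uninformative column

print(column_information(conserved))  # 2.0
print(column_information(two_way))    # 1.0
print(column_information(uniform))    # 0.0

# Keeping the strongest of several candidate patterns (e.g. from repeated searches)
runs = [[conserved, two_way], [uniform, two_way]]
best = max(runs, key=pattern_information)
```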

[0082] Afx-Specific Notes

[0083] For the Afx weight matrix, the sequences described in FIGS. 3 and 4A are utilized to find 24 sites, or pattern instances, of width 12 bp. The resulting frequency matrix is shown in FIG. 5A. After conversion to a weight matrix and the knowledge adaptation procedure, the pattern is as shown in FIG. 5B. The suggested threshold score for sites to consider is 14.0, which is achieved 267 times in 22.2 megabases (22,200,000 base pairs) of genomic sequence.
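The exact conversion from frequency matrix to weight matrix is not spelled out here, and the knowledge adaptation procedure is not described in this section. A common choice, assumed in the Python sketch below, is a log2-odds score against a uniform background (0.25 per base) with a pseudocount of 0.5 per base. Scores produced this way are on a different scale from the thresholds of 14.0, 12.0 and 9.0 quoted in the text and should not be compared with them directly. The column counts are hypothetical.

```python
import math


def weight_matrix(counts: list[dict[str, int]], pseudo: float = 0.5,
                  background: float = 0.25) -> list[dict[str, float]]:
    """Convert per-column nucleotide counts to log2-odds weights."""
    matrix = []
    for col in counts:
        n = sum(col.values())
        matrix.append({b: math.log2((col.get(b, 0) + pseudo) / (n + 4 * pseudo) / background)
                       for b in "ACGT"})
    return matrix


def max_score(matrix: list[dict[str, float]]) -> float:
    """Highest score any window can reach: the best base in every column."""
    return sum(max(col.values()) for col in matrix)


# Hypothetical counts for two columns of a 24-site alignment
counts = [{"A": 24}, {"A": 12, "T": 12}]
wm = weight_matrix(counts)
for col in wm:
    print({b: round(w, 2) for b, w in col.items()})
print(round(max_score(wm), 2))
```

The pseudocount keeps unobserved bases from receiving a weight of minus infinity, so a single mismatch lowers a site's score rather than excluding it outright.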

[0084] FKHR-Specific Notes

[0085] For the FKHR weight matrix, the sequences described in FIG. 4B are utilized to find 35 sites, or pattern instances, of width 8 bp. After conversion to a weight matrix and the knowledge adaptation procedure, the suggested threshold score for sites to consider is 12.0, which is achieved 3284 times in 22.2 megabases (22,200,000 base pairs) of genomic sequence.

[0086] FKHRL1-Specific Notes

[0087] For the FKHRL1 weight matrix, the sequences described in FIG. 4C are utilized to find 16 sites, or pattern instances, of width 7 bp. After conversion to a weight matrix and the knowledge adaptation procedure, the suggested threshold score for sites to consider is 9.0, which is achieved 7487 times in 22.2 megabases (22,200,000 base pairs) of genomic sequence.
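The threshold hit counts translate into site densities with simple arithmetic. Treating the 22.2 megabases as a single pool, the Afx threshold yields about 12 sites per megabase (on average one every ~83 kb), compared with about 148 per megabase for FKHR and about 337 for FKHRL1. Because each model uses its own threshold, these figures describe how stringent each model is at its suggested cutoff rather than giving a direct measure of binding specificity. The short Python sketch below reproduces the calculation.

```python
GENOME_BP = 22_200_000   # 22.2 megabases of genomic sequence

# (threshold score, number of times it is achieved) from the notes above
hits_at_threshold = {
    "Afx": (14.0, 267),
    "FKHR": (12.0, 3284),
    "FKHRL1": (9.0, 7487),
}

density = {}
for factor, (threshold, hits) in hits_at_threshold.items():
    per_mb = hits / (GENOME_BP / 1_000_000)   # predicted sites per megabase
    spacing = GENOME_BP / hits                # mean distance between sites (bp)
    density[factor] = (per_mb, spacing)
    print(f"{factor} (score >= {threshold}): {per_mb:.1f} sites/Mb, "
          f"one site every {spacing:,.0f} bp")
```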

[0088] Searching for Potential Afx Sites

[0089] In order to screen the available genomic sequences for potential Afx sites, the model described above is used, and the range of scores for the sites obtained in the site selection assay is determined. A threshold score of 14.0 is used, and the EMBL/GenBank sequence database was screened for entries containing sequences scoring above this threshold. The score of a site is computed as follows: for any sequence window as wide as the model, take the weight-matrix value corresponding to the observed nucleotide at each position and sum these values.
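The scoring rule amounts to a sliding-window sum over the weight matrix. In the minimal Python sketch below, scanning the reverse complement as well as the forward strand is an assumption, since the text does not say whether both strands were searched. The sketch also skips windows that contain ambiguous bases such as N. `TOY_MATRIX` is a hypothetical matrix built from AACATGTT, not the matrix of FIG. 5B, and the threshold used with it applies to that toy scale only (13.0 tolerates one mismatch).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def score_window(window: str, matrix: list[dict[str, float]]) -> float:
    """Sum the matrix value for the observed nucleotide at each position."""
    return sum(col[base] for col, base in zip(matrix, window))


def scan(sequence: str, matrix: list[dict[str, float]], threshold: float):
    """Yield (position, strand, score) for every window scoring >= threshold.

    Positions are 0-based starts on the forward strand.
    """
    seq = sequence.upper()
    width = len(matrix)
    reverse = seq.translate(COMPLEMENT)[::-1]
    for strand, s in (("+", seq), ("-", reverse)):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in "ACGT" for b in window):   # skip windows containing N etc.
                continue
            score = score_window(window, matrix)
            if score >= threshold:
                start = i if strand == "+" else len(s) - width - i
                yield start, strand, score


# Hypothetical toy matrix: +2 for the consensus base, -1 otherwise
TOY_MATRIX = [{b: (2.0 if b == c else -1.0) for b in "ACGT"} for c in "AACATGTT"]
print(list(scan("ggAACATGTTcc", TOY_MATRIX, threshold=13.0)))
# [(2, '+', 16.0), (2, '-', 16.0)]
```

Because AACATGTT is palindromic, an exact site is reported once on each strand at the same position, so a real screen would typically merge such duplicates.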

[0090] This search produced a number of hits. In order to create a subset of the sites for expert review, the list was narrowed to:

[0091] (1) genomic sequences present in a collection of genes selectively expressed in the liver or in adipocytes; and

[0092] (2) GenBank entries which contained in the title line “promoter” or “enhancer” or “regulatory”.
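The narrowing step can be expressed as a simple filter. The Python sketch below assumes that the two criteria are alternatives, so a hit is kept if it satisfies either one. It also assumes that keyword matching on the title line is case-insensitive and that hits are records with hypothetical accession, title and score fields. The accessions shown are placeholders, not real database entries.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str   # hypothetical database identifier
    title: str       # entry title line
    score: float     # best weight-matrix score in the entry


KEYWORDS = ("promoter", "enhancer", "regulatory")


def select_for_review(hits: list[Hit], tissue_genes: set[str]) -> list[Hit]:
    """Keep hits from the liver/adipocyte gene collection or with a keyword in the title."""
    selected = []
    for hit in hits:
        from_tissue_set = hit.accession in tissue_genes
        has_keyword = any(word in hit.title.lower() for word in KEYWORDS)
        if from_tissue_set or has_keyword:
            selected.append(hit)
    return selected


hits = [
    Hit("HYP0001", "Human gene X promoter region", 15.0),
    Hit("HYP0002", "Unannotated cDNA clone", 14.5),
    Hit("HYP0003", "Liver-expressed transcript, genomic clone", 16.0),
]
review = select_for_review(hits, tissue_genes={"HYP0003"})
print([h.accession for h in review])
```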

[0093] After expert curation, this list was narrowed to the entries shown in Table 1 below.

TABLE 1
Selected genomic gene sequences for potential Afx sites from transcripts expressed in liver/adipocytes and the EMBL-990629 sequence release

Score   Clone/gene
5.44    Mouse M20497 adipose fatty acid binding protein
5.75    hPPARg2 promoter AB005520
7.5     hPCK1 U31519
14.0    hPAC-RPCI4-79
14.2    mouse LPL (exon1); hChr17-HCIT104N19; proenkephalin U09941.1
15.3    hChr20718J7; hOB-gene/exon3
15.4    hBACRG118E13; mouse AC00529
15.5    hNH0576I16; hAldolase reductase
15.7    hα-fetoprotein
15.9    hTyrosine amino transferase
16.0    HIV type-1 enhancer binding protein-2
16.4    hApolipoprotein B-100, and hCOX-2
16.4    hBacRG118E13 (NPY)
16.5    hPacCh14-rpCI4-794B2
18.0    h c-fos
18.2    rat CYP4A1

[0094] This study has identified a novel DNA binding site for the human fork head transcription factor Afx from random-sequence oligonucleotides. FIG. 3 shows an alignment of a set of sequences selected using the GST-AfxDBD protein. A common motif comprising the nucleotide sequence AACATGTT can easily be identified.

[0095] Several studies of DNA target sites for other fork head proteins have been performed. All of these studies identified a common seven-base-pair core recognition motif, (G/A)(T/C)(C/A)AA(C/T)A, whereas the sequences flanking either side do not share any obvious similarities (E. Kaufmann et al. (1996); Mechanisms of Development, 57, pp. 3-20). Positions within the binding sites are referred to relative to the first position of this core, i.e. the (G/A) position, which is position +1. The three adenosines at positions +4, +5, and +7 appear to be critical, since they are conserved in all of the earlier studies and also in the selected response element sequences for the three fork head proteins Afx, FKHR, and FKHRL1. In recent years, several insulin response elements (IREs) have been published (R. M. O'Brien et al. (1996); Gene Regulation in Diabetes Mellitus, Lippincott-Raven Publishers, Philadelphia, pp. 234-242; G. J. P. Kops et al. (1999); Nature, 398, pp. 630-634; S. Guo et al. (1999); J. Biol. Chem., 274, pp. 17184-17192; J. E. Ayala et al. (1999); Diabetes, 48, pp. 1885-1889; and S. K. Durham et al. (1999); Endocrinology, 140, pp. 3140-3146). One of them is an element identified in the promoter region of several genes repressed by insulin in a PI3K/Akt-dependent manner, such as the gene for insulin-like growth factor binding protein-1 (IGFBP-1). This proposed IRE consists of 8 bp, CAAAA(C/T)AA. A comparison of this sequence with the selected consensus sequences for Afx, FKHR, and FKHRL1 reveals that the three adenosines are conserved in this IRE as well.
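The bracket notation used for the fork head core and for the IGFBP-1 IRE maps directly onto regular-expression character classes, which makes the comparison in this paragraph easy to reproduce. The Python sketch below converts the notation, searches one strand with overlapping matches, and reports the bases at core positions +4, +5 and +7. In this sketch, the core is matched by both FKHR/FKHRL1 variants and by the extended Afx consensus GTAAACATGTTGT (SEQ ID NO: 34), where positions +4, +5 and +7 are all A. The 8-bp AACATGTT on its own contains no full core match on either strand. Function names are illustrative.

```python
import re


def bracket_to_regex(motif: str) -> str:
    """Turn notation such as '(G/A)' into the regular-expression class '[GA]'."""
    return re.sub(r"\(([ACGT](?:/[ACGT])+)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]", motif)


FORK_HEAD_CORE = bracket_to_regex("(G/A)(T/C)(C/A)AA(C/T)A")   # [GA][TC][CA]AA[CT]A
IGFBP1_IRE = bracket_to_regex("CAAAA(C/T)AA")                   # CAAAA[CT]AA


def find_all(pattern: str, seq: str) -> list[int]:
    """0-based starts of all (overlapping) matches on the given strand."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def core_adenosines(seq: str, start: int) -> tuple[str, str, str]:
    """Bases at core positions +4, +5 and +7 (1-based, relative to the (G/A) position)."""
    s = seq.upper()
    return s[start + 3], s[start + 4], s[start + 6]


for name, seq in [("FKHR/FKHRL1 (C)", "GTAAACA"), ("FKHR/FKHRL1 (T)", "GTAAATA"),
                  ("SEQ ID NO: 34", "gtaaacatgttgt")]:
    hits = find_all(FORK_HEAD_CORE, seq)
    print(name, hits, [core_adenosines(seq, h) for h in hits])
print(find_all(IGFBP1_IRE, "CAAAATAA"))
```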

[0096] The selected DNA binding site for Afx differs from those of the other two fork head proteins. The Afx core sequence appears to be more specific than those for FKHR and FKHRL1, which show more variation among the clones, and it has a palindromic structure that is not observed for the other fork head proteins.

[0097] The novel response element according to the present invention is useful in the screening for genes as diabetes drug targets and in the bioinformatic analysis of the human genome (see Example 4), providing a subset of genes transcriptionally responsive to insulin. It is also useful in the construction and development of assays that enable and facilitate the analysis of genes interacting with the insulin receptor signaling pathway. Genes found in this screening can in turn be used in other screening methods for compounds modifying the insulin receptor signaling pathway.

[0098] Flanking sequences have been shown to contribute to DNA-binding site specificity, although they are less well defined than the core. This has been shown more directly for some of the FREAC fork head proteins (S. Pierrou et al. (1994); EMBO J., 13, pp. 5002-5012).

SEQUENCE LISTING

Number of SEQ ID NOs: 34

SEQ ID NO: 1; length: 24; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
gccccactcc ataacatgtt gttc 24

SEQ ID NO: 2; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggccgcggat aacaacatgt tgttg 25

SEQ ID NO: 3; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggccgactat caacatgttt gcctg 25

SEQ ID NO: 4; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
gnaaccgnct gttgtnaaca tgttg 25

SEQ ID NO: 5; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggacggtagg ggagtaaaca tgttg 25

SEQ ID NO: 6; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggcgggaggt gtcaacatgt tgtgc 25

SEQ ID NO: 7; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggcacccgca gtaaacatgt tatgc 25

SEQ ID NO: 8; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggcatagctc tgtgtaaaca tgttg 25

SEQ ID NO: 9; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggcaggacgg tacacaaaca tgttg 25

SEQ ID NO: 10; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggcatcaaca tgtttataat gggtg 25

SEQ ID NO: 11; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggcagcggta aacatgttgt ctccc 25

SEQ ID NO: 12; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggcccacggt caacatgttt tgatg 25

SEQ ID NO: 13; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggcggggact cgggtaaaca tgttg 25

SEQ ID NO: 14; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggagcataaa catgttgttg gcggc 25

SEQ ID NO: 15; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggcggtaaac atgttgngat cagng 25

SEQ ID NO: 16; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
gggcgggata aacatgttat gctcc 25

SEQ ID NO: 17; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggtaggcagc acaacatgtt taccc 25

SEQ ID NO: 18; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
ggtacatacg gtaaacatgt tgtgc 25

SEQ ID NO: 19; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
gggagcccca taacatgttt tcacg 25

SEQ ID NO: 20; length: 25; type: DNA; organism: Artificial Sequence; feature: synthetically generated oligonucleotide
gggcccaggc ataaacatgt tgttg 25

SEQ ID NO: 21; length: 15; type: DNA; organism: Artificial Sequence; feature: consensus sequence
gggtaaacat gttgt 15

SEQ ID NO: 22; length: 1073; type: DNA; organism: Artificial Sequence; feature: synthetically generated GST-AfxDBD construct
atgtccccta tactaggtta ttggaaaatt aagggccttg tgcaacccac tcgacttctt 60
ttggaatatc ttgaagaaaa atatgaagag catttgtatg agcgcgatga aggtgataaa 120
tggcgaaaca aaaagtttga attgggtttg gagtttccca atcttcctta ttatattgat 180
ggtgatgtta aattaacaca gtctatggcc atcatacgtt atatagctga caagcacaac 240
atgttgggtg gttgtccaaa agagcgtgca gagatttcaa tgcttgaagg agcggttttg 300
gatattagat acggtgtttc gagaattgca tatagtaaag actttgaaac tctcaaagtt 360
gattttctta gcaagctacc tgaaatgctg aaaatgttcg aagatcgttt atgtcataaa 420
acatatttaa atggtgatca tgtaacccat cctgacttca tgttgtatga cgctcttgat 480
gttgttttat acatggaccc aatgtgcctg gatgcgttcc caaaattagt ttgttttaaa 540
aaacgtattg aagctatccc acaaattgat aagtacttga aatccagcaa gtatatagca 600
tggcctttgc agggctggca agccacgttt ggtggtggcg accatcctcc aaaatcggat 660
ctggttccgc gtggatctgg taccgagctc ggatccgggg ctgtaacagg tcctcggaag 720
ggaggctccc gccggaatgc ctggggaaat cagtcatatg cagaactcat cagccaggcc 780
attgaaagcg ccccggagaa gcgactgaca cttgcccaga tctacgagtg gatggtccgt 840
actgtaccct acttcaagga caagggtgac agcaacagct cagcaggatg gaagaactcg 900
atccgccaca acctgtccct gcacagcaag ttcatcaagg ttcacaacga ggccaccggc 960
aaaagctctt ggtggatgct gaaccctgag ggaggcaaga gcggcaaagc cccccgccgc 1020
cgggccgcct ccatggatag cagcagaagc tgctccgggg ccgcagtaaa gcc 1073

SEQ ID NO: 23; length: 358; type: PRT; organism: Artificial Sequence; feature: synthetically generated GST-AfxDBD construct
Met Ser Pro Ile Leu Gly Tyr Trp Lys Ile Lys Gly Leu Val Gln Pro 16
Thr Arg Leu Leu Leu Glu Tyr Leu Glu Glu Lys Tyr Glu Glu His Leu 32
Tyr Glu Arg Asp Glu Gly Asp Lys Trp Arg Asn Lys Lys Phe Glu Leu 48
Gly Leu Glu Phe Pro Asn Leu Pro Tyr Tyr Ile Asp Gly Asp Val Lys 64
Leu Thr Gln Ser Met Ala Ile Ile Arg Tyr Ile Ala Asp Lys His Asn 80
Met Leu Gly Gly Cys Pro Lys Glu Arg Ala Glu Ile Ser Met Leu Glu 96
Gly Ala Val Leu Asp Ile Arg Tyr Gly Val Ser Arg Ile Ala Tyr Ser 112
Lys Asp Phe Glu Thr Leu Lys Val Asp Phe Leu Ser Lys Leu Pro Glu 128
Met Leu Lys Met Phe Glu Asp Arg Leu Cys His Lys Thr Tyr Leu Asn 144
Gly Asp His Val Thr His Pro Asp Phe Met Leu Tyr Asp Ala Leu Asp 160
Val Val Leu Tyr Met Asp Pro Met Cys Leu Asp Ala Phe Pro Lys Leu 176
Val Cys Phe Lys Lys Arg Ile Glu Ala Ile Pro Gln Ile Asp Lys Tyr 192
Leu Lys Ser Ser Lys Tyr Ile Ala Trp Pro Leu Gln Gly Trp Gln Ala 208
Thr Phe Gly Gly Gly Asp His Pro Pro Lys Ser Asp Leu Val Pro Arg 224
Gly Ser Gly Thr Glu Leu Gly Ser Gly Ala Val Thr Gly Pro Arg Lys 240
Gly Gly Ser Arg Arg Asn Ala Trp Gly Asn Gln Ser Tyr Ala Glu Leu 256
Ile Ser Gln Ala Ile Glu Ser Ala Pro Glu Lys Arg Leu Thr Leu Ala 272
Gln Ile Tyr Glu Trp Met Val Arg Thr Val Pro Tyr Phe Lys Asp Lys 288
Gly Asp Ser Asn Ser Ser Ala Gly Trp Lys Asn Ser Ile Arg His Asn 304
Leu Ser Leu His Ser Lys Phe Ile Lys Val His Asn Glu Ala Thr Gly 320
Lys Ser Ser Trp Trp Met Leu Asn Pro Glu Gly Gly Lys Ser Gly Lys 336
Ala Pro Arg Arg Arg Ala Ala Ser Met Asp Ser Ser Ser Lys Leu Leu 352
Arg Gly Arg Ser Lys Ala 358

SEQ ID NO: 24; length: 28; type: DNA; organism: Artificial Sequence; feature: polylinker sequence
gatctggtac cgagctcgga tccccggg 28

SEQ ID NO: 25; length: 31; type: DNA; organism: Artificial Sequence; feature: primer for PCR
gacgacggat ccggggctgt aacaggtcct c 31

SEQ ID NO: 26; length: 33; type: DNA; organism: Artificial Sequence; feature: primer for PCR
gacgacggat cctcaggctt tactgcggcc ccg 33

SEQ ID NO: 27; length: 30; type: DNA; organism: Artificial Sequence; feature: primer for PCR
gacgacggat ccctcgcggg gcagccgcgc 30

SEQ ID NO: 28; length: 31; type: DNA; organism: Artificial Sequence; feature: primer for PCR
gacgacggat cctcaagctc ggcttcggct c 31

SEQ ID NO: 29; length: 30; type: DNA; organism: Artificial Sequence; feature: primer for PCR
gacgacggat ccgggggctc cgggcagccg 30

SEQ ID NO: 30; length: 34; type: DNA; organism: Artificial Sequence; feature: primer for PCR
gacgacggat cctcatgcgc ggccacggct cttg 34

SEQ ID NO: 31; length: 65; type: DNA; organism: Artificial Sequence; feature: oligonucleotide for DNA binding site selection
cgctcgaggg atccgaattc nnnnnnnnnn nnnnnnnnnn nnnnntctag aaagcttgtc 60
gacgc 65

SEQ ID NO: 32; length: 20; type: DNA; organism: Artificial Sequence; feature: primer for DNA binding site selection
cgctcgaggg atccgaattc 20

SEQ ID NO: 33; length: 20; type: DNA; organism: Artificial Sequence; feature: primer for DNA binding site selection
gcgtcgacaa gctttctaga 20

SEQ ID NO: 34; length: 13; type: DNA; organism: Artificial Sequence; feature: binding motif
gtaaacatgt tgt 13

1. A nucleotide sequence AACATGTT, said nucleotide sequence comprising a DNA binding site for the human fork head transcription factor Afx.
 2. An Afx response element comprising the nucleotide sequence AACATGTT.
 3. An Afx response element according to claim 2, which is a cytokine response element.
 4. An Afx response element according to claim 3, which is an insulin response element.
 5. A vector construct comprising the nucleotide sequence according to claim 1.
 6. The vector construct according to claim 5, which is pGEX-DBD.
 7. A host cell transformed with the vector construct of claim 5 or 6.
 8. A fusion protein expressed by the host cell of claim 7.
 9. A fusion protein according to claim 8, which is the protein expressed by GST/AfxDBD.
 10. Use of a nucleotide sequence according to claim 1, in the screening for genes.
 11. Use of a nucleotide sequence according to claim 1, in bioinformatic analysis.
 12. A gene identified by the use of a nucleotide sequence according to claim 1.